Embeddings (FastEmbed) (Generative Models)

Synopsis

Calculates embeddings from a text column and stores them in a new column

Description

This operator calculates an embedding (a vector in a high-dimensional space) for each text in a text column and stores the resulting vectors in a new column. These embeddings can be used as input to machine learning algorithms or loaded into vector stores for similarity-based retrieval. With the default setting, this operator produces embeddings with 768 dimensions. The vector sizes of the other models are listed in the FastEmbed documentation: https://qdrant.github.io/fastembed/examples/Supported_Models/ Note also that all embedding operators write the number of dimensions to the log.
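To illustrate what similarity-based retrieval over such vectors involves, here is a minimal Python sketch that ranks stored embeddings by cosine similarity to a query embedding. It is not part of this operator or of FastEmbed: the function names `cosine_similarity` and `top_k` are hypothetical, cosine similarity is assumed as the similarity measure, and the 3-dimensional toy vectors stand in for real embeddings, which have hundreds of dimensions. Vector stores typically replace this exhaustive scan with an approximate nearest-neighbour index.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (row index, similarity) pairs for the k stored vectors most similar to the query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: each inner list plays the role of one row's embedding.
stored = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.1, 0.0]
print(top_k(query, stored, k=2))  # rows 0 and 1, most similar first
```

Because cosine similarity ignores vector length, it compares only the direction of the embeddings; if the vectors are already normalised to unit length, a plain dot product gives the same ranking.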

Input

  • data (Data Table)

    The data containing the text column for which the embeddings should be calculated.

Output

  • data (Data Table)

    The resulting data set with the new embedding column.

Parameters

  • model

    Identifies the model used to calculate the embeddings.

  • input

    The text column for which the embeddings should be calculated.

  • name

    The name of the new column that stores the calculated embeddings.

  • conda_environment

    The conda environment used for this task. Refer to the extension documentation for details on this environment and on the required versions of Python and of all packages used in it.
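The data flow of this operator can be sketched in plain Python as follows. This is a hypothetical illustration, not the extension's implementation: the helper names `add_embeddings` and `toy_embed` are invented, `input_col` and `name` mirror the input and name parameters, and the `embed` callable takes the place of the model parameter. `toy_embed` is a dependency-free hashing stand-in, not a real embedding model. With the fastembed package installed, the `embed` method of a `fastembed.TextEmbedding` instance could presumably be passed instead, but that substitution is an assumption. The sketch logs the number of dimensions, mirroring the log entry mentioned in the description.

```python
import hashlib
import logging
import math
from typing import Callable, Dict, Iterable, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("embeddings")

Row = Dict[str, object]
Embedder = Callable[[Iterable[str]], Iterable[List[float]]]


def toy_embed(texts: Iterable[str], dims: int = 8) -> Iterable[List[float]]:
    """Hypothetical stand-in for a real model: hash tokens into a fixed-size, unit-length vector."""
    for text in texts:
        vec = [0.0] * dims
        for token in text.lower().split():
            bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dims
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero for empty texts
        yield [v / norm for v in vec]


def add_embeddings(rows: List[Row], input_col: str, name: str, embed: Embedder = toy_embed) -> List[Row]:
    """Return copies of `rows` with a new column `name` holding the embedding of column `input_col`."""
    if rows and name in rows[0]:
        raise ValueError(f"column {name!r} already exists")
    texts = [str(row[input_col]) for row in rows]
    vectors = list(embed(texts))
    if vectors:
        # Hypothetical counterpart of the operator's log entry for the vector size.
        log.info("embedding dimensions: %d", len(vectors[0]))
    # The input rows are left unchanged; each output row gets one vector.
    return [{**row, name: vec} for row, vec in zip(rows, vectors)]


data = [
    {"id": 1, "text": "Vector stores enable retrieval"},
    {"id": 2, "text": "Embeddings are vectors"},
]
result = add_embeddings(data, input_col="text", name="embedding")
print(len(result[0]["embedding"]))  # 8 with the toy embedder
```

Keeping the model behind a callable separates the table handling (reading the input column, adding the output column) from the choice of embedding model, which is roughly the split the model, input and name parameters expose.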

Tutorial Processes

Calculate embeddings with FastEmbed

This process takes some texts as input and adds a new column containing their embeddings (vectors in a high-dimensional space).